 primary school


Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering

arXiv.org Artificial Intelligence

We introduce KoLasSimpleQA, the first benchmark evaluating the multilingual factual ability of Large Language Models (LLMs). Inspired by existing research, we created the question set with features such as single knowledge point coverage, absolute objectivity, unique answers, and temporal stability. These questions enable efficient evaluation using the LLM-as-judge paradigm, testing both the LLMs' factual memory and self-awareness ("know what they don't know"). KoLasSimpleQA expands existing research in two key dimensions: (1) Breadth (Multilingual Coverage): It includes 9 languages, supporting global applicability evaluation. (2) Depth (Dual Domain Design): It covers both the general domain (global facts) and the language-specific domain (such as history, culture, and regional traditions) for a comprehensive assessment of multilingual capabilities. We evaluated mainstream LLMs, including traditional LLMs and emerging Large Reasoning Models. Results show significant performance differences between the two domains, particularly in performance metrics, ranking, calibration, and robustness. This highlights the need for targeted evaluation and optimization in multilingual contexts. We hope KoLasSimpleQA will help the research community better identify LLM capability boundaries in multilingual contexts and provide guidance for model optimization. We will release KoLasSimpleQA at https://github.com/opendatalab/KoLasSimpleQA .


Complete Approximations of Incomplete Queries

arXiv.org Artificial Intelligence

This paper studies the completeness of conjunctive queries over a partially complete database and the approximation of incomplete queries. Given a query and a set of completeness rules (a special kind of tuple generating dependencies) that specify which parts of the database are complete, we investigate whether the query can be fully answered, as if all data were available. If not, we explore reformulating the query into either Maximal Complete Specializations (MCSs) or the (unique up to equivalence) Minimal Complete Generalization (MCG) that can be fully answered, that is, the best complete approximations of the query from below or above in the sense of query containment. We show that the MCG can be characterized as the least fixed-point of a monotonic operator in a preorder. Then, we show that an MCS can be computed by recursive backward application of completeness rules. We study the complexity of both problems and discuss implementation techniques that rely on ASP and Prolog engines, respectively.


We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

arXiv.org Artificial Intelligence

Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduce WE-MATH, the first benchmark specifically designed to explore the problem-solving principles beyond end-to-end performance. We meticulously collect and categorize 6.5K visual math problems, spanning 67 hierarchical knowledge concepts and five layers of knowledge granularity. We decompose composite problems into sub-problems according to the required knowledge concepts and introduce a novel four-dimensional metric, namely Insufficient Knowledge (IK), Inadequate Generalization (IG), Complete Mastery (CM), and Rote Memorization (RM), to hierarchically assess inherent issues in LMMs' reasoning process. With WE-MATH, we conduct a thorough evaluation of existing LMMs in visual mathematical reasoning and reveal a negative correlation between solving steps and problem-specific performance. We confirm the IK issue of LMMs can be effectively improved via knowledge augmentation strategies. More notably, the primary challenge of GPT-4o has significantly transitioned from IK to IG, establishing it as the first LMM advancing towards the knowledge generalization stage. In contrast, other LMMs exhibit a marked inclination towards Rote Memorization - they correctly solve composite problems involving multiple knowledge concepts yet fail to answer sub-problems. We anticipate that WE-MATH will open new pathways for advancements in visual mathematical reasoning for LMMs. The WE-MATH data and evaluation code are available at https://github.com/We-Math/We-Math.
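The four-way assessment can be illustrated with a small sketch. Only the Rote Memorization case (composite problem solved, a sub-problem missed) is spelled out above; the rules used here for IK, IG, and CM are assumptions inferred from the category names, and the function `classify` is hypothetical rather than the benchmark's released evaluation code.

```python
from typing import List


def classify(composite_correct: bool, sub_correct: List[bool]) -> str:
    """Assign one of the four categories to a single composite problem.

    composite_correct: whether the model solved the composite problem.
    sub_correct: whether it solved each decomposed sub-problem.
    The IK / IG / CM rules below are assumptions, not quoted definitions.
    """
    all_subs_correct = all(sub_correct)
    if composite_correct:
        # Composite solved: mastery only if every sub-problem is solved too.
        return "CM" if all_subs_correct else "RM"
    # Composite failed: the sub-problems tell us whether knowledge was missing
    # (IK) or present but not combined successfully (IG).
    return "IG" if all_subs_correct else "IK"


if __name__ == "__main__":
    print(classify(True, [True, False]))   # composite right, one sub-problem wrong
    print(classify(False, [True, True]))   # composite wrong, all sub-problems right
```

Counting the share of problems in each category across a test set would then give a per-model profile of the kind the abstract describes.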


Mental state attribution to educational robots: an experience with children in primary school

arXiv.org Artificial Intelligence

The work presented in this paper was carried out in the context of the project "Girls and boys: one day at university", promoted by the City of Turin together with the University of Turin. We were responsible for two educational activities on robotics and coding hosted at the Computer Science Department, which made one of its laboratories available for this kind of lesson. At the conclusion of the lab sessions, children completed the Attribution of Mental State (AMS) questionnaire, which measures the mental states that participants attribute to robots, namely the user's perception of the robot's mental qualities as compared to humans. We distributed the questionnaires both to children attending the educational robotics lab and to children performing coding activities. Results show that the first group attributed higher mental qualities to the robots than did the children who had no direct experience with a robot.


How We Learned To Break Down Barriers To Machine Learning - AI Summary

#artificialintelligence

This article is the first in a short series of pieces that will recap each of the day's talks for the benefit of those who weren't able to travel to DC for our first conference. Dr. Sephus came to AWS via a roundabout path, growing up in Mississippi before eventually joining a tech startup called Partpic. When asked, she identified access as the biggest barrier to the greater use of AI/ML--in a lot of ways, it's another wrinkle in the old problem of the digital divide. A core component of being able to utilize most common AI/ML tools is having reliable and fast Internet access, and drawing on experience from her background, Dr. Sephus pointed out that a lack of access to technology in primary schools in poorer areas of the country sets kids on a path away from being able to use the kinds of tools we're talking about. Dr. Sephus said that AWS has been hiring sociologists and psychologists to join its tech teams to figure out ways to tackle the digital divide by meeting people where they are rather than forcing them to come to the technology.


Spatially weighted averages in R with sf

#artificialintelligence

Spatial joins make it possible to augment one spatial dataset with information from another spatial dataset by linking overlapping features. In this post I will provide an example showing how to augment a dataset containing school locations with socioeconomic data of their surrounding statistical region using R and the package sf (Pebesma 2018). This approach has the drawback that the surrounding statistical region doesn't reflect the actual catchment area of the school. I will present an alternative approach in which the overlaps of the schools' catchment areas with the statistical regions are used to calculate a weighted average of the socioeconomic statistics. If we have no data about the actual catchment areas of the schools, we may resort to approximating these areas as circular regions or as Voronoi regions around schools.
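The weighting idea can be shown without any GIS library. The sketch below is in Python rather than R and assumes, purely for illustration, that both the catchment area and the statistical regions are axis-aligned rectangles; the names `Rect`, `overlap_area`, and `weighted_average` are hypothetical and are not part of sf. Each region's statistic is weighted by the area it shares with the catchment.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (simplifying assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, 0.0 if they do not overlap."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def weighted_average(catchment: Rect, regions: List[Tuple[Rect, float]]) -> float:
    """Average the regions' values, weighted by their overlap with the catchment."""
    weights = [overlap_area(catchment, region) for region, _ in regions]
    total = sum(weights)
    if total == 0:
        raise ValueError("catchment does not overlap any region")
    return sum(w * value for w, (_, value) in zip(weights, regions)) / total


if __name__ == "__main__":
    # Made-up numbers: two neighbouring regions with different statistics.
    regions = [(Rect(0, 0, 2, 2), 10.0), (Rect(2, 0, 4, 2), 20.0)]
    catchment = Rect(1, 0, 4, 2)  # covers half of the first region, all of the second
    print(weighted_average(catchment, regions))
```

With real polygons, the overlap areas would instead come from a geometric intersection of the catchment with each region, but the weighting step stays the same.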


Primary school to use AI to monitor students

#artificialintelligence

In May, students who have graduated from kindergarten look forward to starting their studies in September at the Minhang Qiangwei Primary School, affiliated with Shanghai University of Traditional Medicine.


Researchers use AI to track students' performance in online courses

#artificialintelligence

What insights might be gleaned from an education platform that's entirely online? In a newly published paper on the preprint server Arxiv.org, researchers say their method allowed for tracking changes in behavior among students over time, as well as trends in the broader educational system. "How students behave … is an important topic in educational data mining. Knowledge of this behavior in an educational system can help us understand how students learn and help guide the development for optimal learning based on actual use," wrote the coauthors.


How Artificial Intelligence Helps Tech Students In The Learning Process

#artificialintelligence

Artificial Intelligence has yet to become standard in schools, but it has the potential to transform the educational field. It is a technology whose time has certainly come because it can already outperform humans in many ways. However, it can be very helpful for tech students. Meeting the needs of each student becomes a must in today's classroom. For example, a teacher should create personalized tasks to fit the learning style of students and ensure that they enjoy the same access to learning.